Learningtower: Comparative Analysis of PISA 2022 and Historical Data

Shabarish Sai and Guan Ru Chen

Department of Econometrics and Business Statistics

2024-10-18

👷Contributors



Shabarish Sai Subramanian

Guan Ru Chen

Dianne Cook

Kevin Y.X. Wang

Priya Ravindra Dingorkar



Introduction

Learningtower Package





The Learningtower package provides access to PISA datasets from 2000 to 2022, allowing researchers to explore trends in education, student performance, and other contextual factors.

Learningtower contains three main datasets:

  • Student: student scores in mathematics, reading and science.
  • School: detailed school information, e.g. school weight, distribution of school funding, and private/public sector.
  • Countrycode: a mapping from each country/region's ISO code to its full name.

What is PISA (Programme for International Student Assessment)?

Global Examination

Measures student performance in reading, math, and science

Target Group

Assesses 15-year-old students’ knowledge and skills

Global Reach

81 participating countries and economies (OECD members and partners), about 700,000 students in 2022

Educational Environment Research

Additional questionnaires completed by students, teachers, and school principals gather contextual data on educational environments, socio-economic status, and more.

📒Methodology

  • Step 1: Downloaded the original datasets from the PISA website.

  • Step 2: Cleaned and wrangled the data with the appropriate script; the variables of interest were re-categorised and saved with appropriate data types.

  • Step 3: Uploaded the new datasets to the package repo and ran package checks to prepare for CRAN submission.
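The re-categorisation in Step 2 can be pictured with a small sketch. It is written in Python purely for illustration (learningtower itself is an R package), and the raw codes, the two mapping tables, and the `clean_row` helper are all assumptions made for this example, not the PISA codebook or the authors' script.

```python
# Hypothetical raw-code mappings; the real PISA coding differs between survey years.
GENDER_LABELS = {"1": "female", "2": "male"}
YES_NO_LABELS = {"1": "yes", "2": "no"}


def clean_row(raw):
    """Recode one raw record (all values are strings) into labelled, typed values."""

    def to_float(value):
        # Treat empty strings and "NA" as missing.
        return float(value) if value not in ("", "NA") else None

    return {
        "year": int(raw["year"]),
        "country": raw["country"],
        # Unknown codes map to None instead of raising an error.
        "gender": GENDER_LABELS.get(raw["gender"]),
        "computer": YES_NO_LABELS.get(raw["computer"]),
        "math": to_float(raw["math"]),
    }


raw_record = {"year": "2022", "country": "ALB", "gender": "1", "computer": "2", "math": "180.4"}
cleaned = clean_row(raw_record)
print(cleaned)
```

Keeping the mappings in plain lookup tables means that a change in coding between surveys only requires a new table, not new logic.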



📚PISA Dataset: Student

year | country | school_id | student_id | mother_educ | father_educ | gender | computer | internet | math | read | science | stu_wgt | desk | room | dishwasher | television | computer_n | laptop_n | car | book          | wealth | escs
2022 | ALB     | 800282    | 800001     | ISCED 1     | ISCED 2     | female | yes      | NA       | 180  | 248  | 335     | 3.2     | NA   | NA   | NA         | NA         | NA         | NA       | NA  | 101-200       | NA     | 1.11
2022 | ALB     | 800115    | 800002     | ISCED 1     | ISCED 2     | male   | no       | no       | 308  | 258  | 315     | 4.3     | NA   | no   | NA         | 1          | 0          | 0        | 0   | 11-25         | NA     | -3.05
2022 | ALB     | 800242    | 800003     | ISCED 3A    | ISCED 2     | male   | yes      | yes      | 268  | 285  | 359     | 7.8     | NA   | yes  | NA         | 1          | 1          | 0        | 1   | 101-200       | NA     | -0.19
2022 | ALB     | 800245    | 800005     | ISCED 1     | ISCED 1     | female | yes      | yes      | 273  | 322  | 215     | 8.5     | NA   | yes  | NA         | 1          | 1          | 0        | 0   | 11-25         | NA     | -3.22
2022 | ALB     | 800285    | 800006     | ISCED 3A    | ISCED 2     | female | yes      | yes      | 435  | 464  | 435     | 3.7     | NA   | yes  | NA         | 1          | 1          | 1        | 2   | 11-25         | NA     | -1.05
2022 | ALB     | 800172    | 800007     | ISCED 3A    | ISCED 3A    | male   | yes      | yes      | 534  | 451  | 479     | 4.3     | NA   | no   | NA         | NA         | 1          | 2        | 1   | more than 500 | NA     | 1.09

✍️Variable Description

From the original dataset, we collect the following variables:

  • Year

  • Country

  • School

  • Student information: ID, gender, test scores, student weight

  • Economic factors: parents' education levels, household possessions (e.g. computer, internet), and constructed indices such as escs.
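As a rough illustration of how the test scores and the student weight fit together, the sketch below computes a weighted mean math score from the six sample rows in the student table. It assumes that stu_wgt is meant to be used as a survey weight and uses the rounded values as displayed; the `weighted_mean` helper is hypothetical, and six rows are far too few to say anything about Albania.

```python
# (math score, stu_wgt) pairs copied from the six sample rows of the student table.
sample = [(180, 3.2), (308, 4.3), (268, 7.8), (273, 8.5), (435, 3.7), (534, 4.3)]


def weighted_mean(pairs):
    """Return sum(w * x) / sum(w), skipping pairs where either value is missing."""
    usable = [(x, w) for x, w in pairs if x is not None and w is not None]
    total_weight = sum(w for _, w in usable)
    return sum(x * w for x, w in usable) / total_weight


plain_mean = sum(x for x, _ in sample) / len(sample)
survey_mean = weighted_mean(sample)
print(f"unweighted: {plain_mean:.1f}, weighted: {survey_mean:.1f}")
```

In this toy sample the weighted mean falls below the unweighted one, because the two rows with the largest weights both have below-average scores.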

Gender Gap Analysis

🔢Math

  • Singapore consistently has the highest math scores, while countries like Argentina and Morocco score lower.

  • Boys score higher in most countries, while girls score higher in Finland and Morocco.
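One simple way to quantify a gender gap like this is the weighted mean score of boys minus that of girls. The sketch below is an assumption about how such a gap could be computed, not necessarily the method behind the slides. It reuses the six Albanian sample rows from the student table, so the number it produces is illustrative only; `weighted_means_by_group` is a hypothetical helper.

```python
from collections import defaultdict

# (gender, math score, stu_wgt) copied from the six sample rows of the student table.
rows = [
    ("female", 180, 3.2), ("male", 308, 4.3), ("male", 268, 7.8),
    ("female", 273, 8.5), ("female", 435, 3.7), ("male", 534, 4.3),
]


def weighted_means_by_group(records):
    """Return {group: weighted mean score} for (group, score, weight) records."""
    score_sum = defaultdict(float)
    weight_sum = defaultdict(float)
    for group, score, weight in records:
        score_sum[group] += score * weight
        weight_sum[group] += weight
    return {group: score_sum[group] / weight_sum[group] for group in score_sum}


means = weighted_means_by_group(rows)
# Positive gap: boys score higher; negative gap: girls score higher.
math_gap = means["male"] - means["female"]
print(f"boys: {means['male']:.1f}, girls: {means['female']:.1f}, gap: {math_gap:.1f}")
```

Running the same calculation per country and per year would give the quantities compared across subjects in this section.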

📖Reading

  • Singapore consistently shows high reading scores, while countries like Argentina and Morocco show some decline.
  • Girls generally outperform boys in reading across all countries.

🔬Science

  • Singapore and Japan lead in science scores in both 2018 and 2022.

  • In Finland and Saudi Arabia, girls outperform boys significantly.

🗺️World Map

EcoSocio Factors Analysis

👪Parent’s Education

Children whose parents have higher levels of education tend to perform better academically.

🧑‍💻Impact of Technology Assistance

In all countries, students who own a computer and have internet access score higher.

Temporal Analysis

📈Gender Gaps Across Subjects and Years

The gaps have remained fairly stable over time, without significant changes. However, most of them narrowed from 2018 to 2022.

🔎Highlighting Key Countries

🛠️Limitations & Discussion



Size limitation on CRAN packages

The package will keep growing as each new survey is added, so further curation of the data, or alternative compression of the datasets, should be explored.
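To give a feel for what comparing compression options involves, the sketch below compresses the same synthetic table with three standard codecs and reports the resulting sizes. It uses Python's standard library and made-up rows as a stand-in; it is not the package's build process, and ratios on real PISA data will differ.

```python
import bz2
import gzip
import lzma

# Synthetic stand-in for a student table: repetitive CSV text, not real PISA data.
header = "year,country,student_id,gender,math\n"
body = "".join(
    f"2022,ALB,{800000 + i},{'female' if i % 2 else 'male'},{300 + i % 200}\n"
    for i in range(5000)
)
raw = (header + body).encode("utf-8")

# Compress the same bytes with three standard-library codecs.
compressed_sizes = {
    "gzip": len(gzip.compress(raw)),
    "bzip2": len(bz2.compress(raw)),
    "xz": len(lzma.compress(raw)),
}

for codec, size in compressed_sizes.items():
    print(f"{codec}: {size} bytes ({size / len(raw):.1%} of raw)")
```

The codec that wins depends on the data, so a comparison like this would need to be rerun on the actual datasets before choosing one.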

Variables Consistency

The questionnaire design and the coding scheme of the original dataset change from survey to survey, so the curation process must be re-examined every time to keep the variables consistent.

Further Update

  • The Learningtower package with the 2022 data is scheduled for release on CRAN by next month.
  • An R Journal paper based on the 2022 data is to be published by Dianne Cook and Priya Dingorkar.



Thank You